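** Step 1: load the dataset
* forward slashes in the path work on Windows, macOS, and Linux;
* clear lets the do-file be rerun when data are already in memory
import excel "./Piscopo_GenderintheJournals_OriginalDataFile.xlsx", sheet("Sheet1") firstrow clear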
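
* Optional sanity checks after the import (a sketch, not part of the original
* replication steps): confirm that the variables used below exist and look at
* their ranges before running the tabulations.
describe, short
confirm variable journal year articletype gender sexuality methods author1_sex
codebook articletype gender sexuality methods, compact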


** Step 2: codebook for the hand-coded variables imported from the Excel file
* articletype
* 1 = research article
* 2 = short
* 3 = letter

* gender -- is the article coded as a gender item?
* 1 = yes
* 0 = no

* sexuality -- is the article coded as an LGBTQIA+ item?
* 1 = yes
* 0 = no

* author sex codes (author1_sex through author13_sex)
* 1 = woman
* 0 = man

* methods
* note: because of limited research-assistant (RA) coding resources, methods codes were only added for gender and LGBTQIA+ articles
* 1 = quantitative - empirical
* 2 = qualitative - empirical
* 3 = normative
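
* One possible way to attach the codebook above to the data as value labels,
* so that tab output shows category names instead of numeric codes. This is
* only a sketch: the label names (articletype_lbl, yesno_lbl, sex_lbl,
* methods_lbl) are hypothetical, and it assumes the coded variables were
* imported as numeric. capture skips a command quietly if that assumption fails.
label define articletype_lbl 1 "research article" 2 "short" 3 "letter", replace
label define yesno_lbl 0 "no" 1 "yes", replace
label define sex_lbl 0 "man" 1 "woman", replace
label define methods_lbl 1 "quantitative - empirical" 2 "qualitative - empirical" 3 "normative", replace
capture label values articletype articletype_lbl
capture label values gender sexuality yesno_lbl
capture label values author*_sex sex_lbl
capture label values methods methods_lbl
* value labels change only the display; conditions such as gender==1 still work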


** Step 3: produce the results
** REPLICATION CODE

** proportion of research articles vs. letters vs. shorts, and by focus on gender and sexuality
tab articletype
tab gender
tab sexuality
tab gender if articletype==1
tab sexuality if articletype==1

** generate data for table 1
**Frequency of gender across all journals 
sort journal
by journal: tab gender
tab journal gender, chi2 row
tab journal sexuality, chi2 row

* exclude shorts and letters (research articles only)
tab journal gender if articletype==1, chi2 row
tab journal sexuality if articletype==1, chi2 row

** please note that once the summary statistics are generated using the tab functions, the tables are then created in Excel
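
* As an alternative to copying the counts into Excel by hand, the cell
* frequencies could be written out directly. A minimal sketch, assuming
* Stata 14 or later for this putexcel syntax; the matrix name t1_gender and
* the output file name table1_counts.xlsx are hypothetical.
tab journal gender if articletype==1, matcell(t1_gender)
putexcel set "table1_counts.xlsx", sheet("gender") replace
* rows follow the sorted journal values, columns are gender = 0 and gender = 1;
* row and column headings are not written and would need to be added
putexcel A1 = matrix(t1_gender)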

** claims about methods in discussion of table 1
** 94% of gender articles are empirical:
tab methods if gender==1
** all LGBTQIA+ articles are empirical:
tab methods if sexuality==1
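
* The empirical share combines methods codes 1 and 2 from the codebook. One
* way to read that share off directly is a binary indicator. This is a
* sketch: the variable name empirical is hypothetical, and methods is assumed
* to be numeric. Because methods is only coded for gender and LGBTQIA+
* articles, the indicator is left missing everywhere else.
gen byte empirical = inlist(methods, 1, 2) if !missing(methods)
tab empirical if gender==1
tab empirical if sexuality==1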



**generate data for table 2- author comparison

**author analysis
egen authortotal = rownonmiss(author1_sex author2_sex author3_sex author4_sex author5_sex author6_sex author7_sex author8_sex author9_sex author10_sex author11_sex author12_sex author13_sex), strok

egen womauthtotal = rowtotal(author1_sex author2_sex author3_sex author4_sex author5_sex author6_sex author7_sex author8_sex author9_sex author10_sex author11_sex author12_sex author13_sex)

tab womauthtotal if authortotal!=0
	* excludes the one article with no author sex coded, for which authortotal is zero
	* 1639 authors are men
	* 2891-1639 authors are women -- so 1252
	
egen menauthtotal = anycount(author1_sex author2_sex author3_sex author4_sex author5_sex author6_sex author7_sex author8_sex author9_sex author10_sex author11_sex author12_sex author13_sex), values(0)

* share of women among coded authors (a proportion from 0 to 1; missing when authortotal is 0)
gen womauthpct = womauthtotal / authortotal
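
* Quick consistency check on the three author counts (a sketch): if every
* coded author is either 0 or 1, women plus men should equal the number of
* coded authors on each article, so this count should be zero.
count if womauthtotal + menauthtotal != authortotal
summarize womauthpct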

* authorship of all articles
* get solo-authored pubs by gender
tab author1_sex if authortotal==1
* get number of all-women teams 
tab authortotal if womauthpct==1 & authortotal!=1 & authortotal!=0
* get number of all-men teams
tab authortotal if womauthpct==0 & authortotal!=1 & authortotal!=0
* get number of mixed teams
tab authortotal if womauthpct!=0 & womauthpct!=1 & authortotal!=1 & authortotal!=0
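
* The four tabulations above could also be summarized with one categorical
* variable. A sketch under stated assumptions: the names teamtype and
* teamtype_lbl are hypothetical, and solo authors are classified from
* womauthpct rather than author1_sex, which gives the same result only when
* the single coded author sits in the author1_sex column.
gen byte teamtype = .
replace teamtype = 1 if authortotal==1 & womauthpct==1
replace teamtype = 2 if authortotal==1 & womauthpct==0
replace teamtype = 3 if authortotal>1 & womauthpct==1
replace teamtype = 4 if authortotal>1 & womauthpct==0
replace teamtype = 5 if authortotal>1 & womauthpct>0 & womauthpct<1
label define teamtype_lbl 1 "solo woman" 2 "solo man" 3 "all-women team" 4 "all-men team" 5 "mixed team", replace
label values teamtype teamtype_lbl
* articles with no coded author stay missing and drop out of the table
tab teamtype
tab teamtype if gender==1
tab teamtype if sexuality==1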

*authorship of gender articles
* for solo-authored pubs
tab author1_sex if authortotal==1 & gender==1
* get number of all-women teams 
tab authortotal if womauthpct==1 & authortotal!=1 & authortotal!=0 & gender==1
* get number of all-men teams
tab authortotal if womauthpct==0 & authortotal!=1 & authortotal!=0 & gender==1
* get number of mixed teams
tab authortotal if womauthpct!=0 & womauthpct!=1 & authortotal!=1 & authortotal!=0 & gender==1

*authorship of sexuality articles
* for solo-authored pubs
tab author1_sex if authortotal==1 & sexuality==1
* get number of all-women teams 
tab authortotal if womauthpct==1 & authortotal!=1 & authortotal!=0 & sexuality==1
* get number of all-men teams
tab authortotal if womauthpct==0 & authortotal!=1 & authortotal!=0 & sexuality==1
* get number of mixed teams
tab authortotal if womauthpct!=0 & womauthpct!=1 & authortotal!=1 & authortotal!=0 & sexuality==1

** Check for journal differences
* authorship of all articles
* get solo-authored pubs by gender
tab journal author1_sex if authortotal==1, chi2
* get number of all-women teams 
tab journal authortotal if womauthpct==1 & authortotal!=1 & authortotal!=0, chi2
* get number of all-men teams
tab journal authortotal if womauthpct==0 & authortotal!=1 & authortotal!=0, chi2
* get number of mixed teams
tab journal authortotal if womauthpct!=0 & womauthpct!=1 & authortotal!=1 & authortotal!=0, chi2

* gender articles
* for solo-authored pubs
tab journal author1_sex if authortotal==1 & gender==1, chi2
* get number of all-women teams
tab journal authortotal if womauthpct==1 & authortotal!=1 & authortotal!=0 & gender==1, chi2
* get number of all-men teams
tab journal authortotal if womauthpct==0 & authortotal!=1 & authortotal!=0 & gender==1, chi2
* get number of mixed teams
tab journal authortotal if womauthpct!=0 & womauthpct!=1 & authortotal!=1 & authortotal!=0 & gender==1, chi2

* LGBTQIA+ articles
* for solo-authored pubs
tab journal author1_sex if authortotal==1 & sexuality==1, chi2
* get number of all-women teams
tab journal authortotal if womauthpct==1 & authortotal!=1 & authortotal!=0 & sexuality==1, chi2
* get number of all-men teams
tab journal authortotal if womauthpct==0 & authortotal!=1 & authortotal!=0 & sexuality==1, chi2
* get number of mixed teams
tab journal authortotal if womauthpct!=0 & womauthpct!=1 & authortotal!=1 & authortotal!=0 & sexuality==1, chi2
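
* The gender and LGBTQIA+ blocks above differ only in the topic variable, so
* they could be written once as a loop. A sketch that reruns the same four
* tabulations per topic and prints a heading before each set; the local
* macro name topic is an illustrative choice.
foreach topic of varlist gender sexuality {
    display as text _n "Journal differences, articles with `topic'==1"
    tab journal author1_sex if authortotal==1 & `topic'==1, chi2
    tab journal authortotal if womauthpct==1 & authortotal!=1 & authortotal!=0 & `topic'==1, chi2
    tab journal authortotal if womauthpct==0 & authortotal!=1 & authortotal!=0 & `topic'==1, chi2
    tab journal authortotal if womauthpct!=0 & womauthpct!=1 & authortotal!=1 & authortotal!=0 & `topic'==1, chi2
}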

** please note that once the summary statistics are generated using the tab functions, the tables are then created in Excel

**compare APSR editorial teams
** old team: 2017, 2018, 2019
** new team: 2021, 2022, 2023
** exclude 2020 as transitional year

** make time1 (old team years) and time2 (new team years) indicator variables
* 1 = year falls in the period, 0 = otherwise
gen time1 = inlist(year, 2017, 2018, 2019)
tab time1
tab year
gen time2 = inlist(year, 2021, 2022, 2023)
tab time2
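
* A quick cross-check of the two period flags (a sketch): each year should
* fall in at most one period, 2020 should be 0 on both, and the final count
* should be zero.
tab year time1
tab year time2
count if time1==1 & time2==1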

** one variable for article content: 1 = gender, 2 = LGBTQIA+ (sexuality), 0 = neither
gen content = 99
recode content 99 = 1 if gender==1
recode content 99 = 2 if sexuality==1
recode content 99 = 0
tab content
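
* Because the gender rule is applied first, an article coded as both gender
* and sexuality ends up with content==1. A sketch for checking how many
* articles that affects:
tab gender sexuality
count if gender==1 & sexuality==1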

** table 2
tab content if time1==1 & journal=="APSR"
tab content if time2==1 & journal=="APSR"

** chi2 test
** old team: 2017, 2018, 2019
** new team: 2021, 2022, 2023
** exclude 2020 as transitional year
* APSRold: 1 = APSR article from 2017-2019, 0 = any other APSR article except 2020,
* missing for non-APSR journals and for 2020
gen APSRold = inlist(year, 2017, 2018, 2019) if journal=="APSR" & year!=2020

tab APSRold gender, chi2 row
tab APSRold gender if articletype==1, chi2 row
* same comparisons, additionally dropping 2017 and 2021 (the first year in each three-year window)
tab APSRold gender if articletype==1 & year!=2017 & year!=2021, chi2 row
tab APSRold gender if year!=2017 & year!=2021, chi2 row
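
* With a 2x2 table and possibly small cell counts, Fisher's exact test is a
* common complement to the Pearson chi-squared test. This is an added check
* offered as a sketch, not part of the original analysis:
tab APSRold gender if articletype==1, exact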

**figure 3 and interpretation
tab journal gender if time1==1, row chi2
tab journal gender if time2==1, row chi2
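
* Since gender is coded 0/1, its mean within a journal is the share of gender
* articles, which matches the gender==1 row percentage in the tables above
* (expressed as a proportion). A sketch using tabstat to list those shares
* alongside the number of articles per journal:
tabstat gender if time1==1, by(journal) statistics(mean n)
tabstat gender if time2==1, by(journal) statistics(mean n)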

** please note that once the summary statistics are generated using the above functions, the figure is then created in Excel

**Appendix
**Appendix A1
* gender articles
* for solo-authored pubs
tab journal author1_sex if authortotal==1 & gender==1 & time2==1, chi2
* get number of all-women teams
tab journal authortotal if womauthpct==1 & authortotal!=1 & authortotal!=0 & gender==1 & time2==1, chi2
* get number of all-men teams
tab journal authortotal if womauthpct==0 & authortotal!=1 & authortotal!=0 & gender==1 & time2==1, chi2
* get number of mixed teams
tab journal authortotal if womauthpct!=0 & womauthpct!=1 & authortotal!=1 & authortotal!=0 & gender==1 & time2==1, chi2

** please note that once the summary statistics are generated using the tab functions, the tables are then created in Excel